US Police Killings¶

The data set covers killings of civilians by police in the US. It contains information on each police killing in the US from January 2015 to June 2015.

The goal is to investigate these killings.

In [2]:

import pandas as pd
police_killings = pd.read_csv("police_killings.csv", encoding="ISO-8859-1")
police_killings.head(5)

Out[2]:

	name	age	gender	raceethnicity	month	day	year	streetaddress	city	state	...	share_hispanic	p_income	h_income	county_income	comp_income	county_bucket	nat_bucket	pov	urate	college
0	A'donte Washington	16	Male	Black	February	23	2015	Clearview Ln	Millbrook	AL	...	5.6	28375	51367.0	54766	0.937936	3.0	3.0	14.1	0.097686	0.168510
1	Aaron Rutledge	27	Male	White	April	2	2015	300 block Iris Park Dr	Pineville	LA	...	0.5	14678	27972.0	40930	0.683411	2.0	1.0	28.8	0.065724	0.111402
2	Aaron Siler	26	Male	White	March	14	2015	22nd Ave and 56th St	Kenosha	WI	...	16.8	25286	45365.0	54930	0.825869	2.0	3.0	14.6	0.166293	0.147312
3	Aaron Valdez	25	Male	Hispanic/Latino	March	11	2015	3000 Seminole Ave	South Gate	CA	...	98.8	17194	48295.0	55909	0.863814	3.0	3.0	11.7	0.124827	0.050133
4	Adam Jovicic	29	Male	White	March	19	2015	364 Hiwood Ave	Munroe Falls	OH	...	1.7	33954	68785.0	49669	1.384868	5.0	4.0	1.9	0.063550	0.403954

5 rows × 34 columns

In [3]:

police_killings.columns

Out[3]:

Index(['name', 'age', 'gender', 'raceethnicity', 'month', 'day', 'year',
       'streetaddress', 'city', 'state', 'latitude', 'longitude', 'state_fp',
       'county_fp', 'tract_ce', 'geo_id', 'county_id', 'namelsad',
       'lawenforcementagency', 'cause', 'armed', 'pop', 'share_white',
       'share_black', 'share_hispanic', 'p_income', 'h_income',
       'county_income', 'comp_income', 'county_bucket', 'nat_bucket', 'pov',
       'urate', 'college'],
      dtype='object')

In [4]:

count_race = police_killings["raceethnicity"].value_counts()

In [5]:

%matplotlib inline
import matplotlib.pyplot as plt

Shootings By Race¶

In [6]:

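# Derive the bar positions from the data instead of hard-coding six categories.
positions = range(len(count_race))
plt.bar(positions, count_race.values)
plt.xticks(positions, count_race.index, rotation="vertical")
plt.show()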

In [7]:

count_race / sum(count_race)

Out[7]:

White                     0.505353
Black                     0.289079
Hispanic/Latino           0.143469
Unknown                   0.032120
Asian/Pacific Islander    0.021413
Native American           0.008565
Name: raceethnicity, dtype: float64
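
The same proportions can be obtained in one step with `value_counts(normalize=True)`. A minimal sketch on a small made-up series (the values are illustrative and not taken from the data set):

```python
import pandas as pd

# Hypothetical toy column standing in for police_killings["raceethnicity"].
race = pd.Series(["White", "White", "Black", "Hispanic/Latino"])

# Manual normalisation, as in the cell above.
counts = race.value_counts()
manual = counts / counts.sum()

# Built-in normalisation; yields the same proportions.
shares = race.value_counts(normalize=True)
print(shares)
```

On the real data this would be `police_killings["raceethnicity"].value_counts(normalize=True)`.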

Shootings By Regional Income¶

In [8]:

income = police_killings["p_income"][police_killings["p_income"] != '-'].astype('int')
plt.hist(income,bins=30)
plt.show()

In [21]:

# Reuse the cleaned income Series built for the histogram above.
income.median()

Out[21]:

22348.0

According to the Census, median personal income in the US is $28,567, whereas the median in our data is $22,348. This suggests that shootings tend to happen in less affluent areas. Our sample size is relatively small, though, so it's hard to draw firm conclusions.
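
The median above relies on filtering out the `'-'` placeholder before casting to `int`. An alternative sketch, assuming `'-'` is the only non-numeric marker in `p_income`, uses `pd.to_numeric` with `errors="coerce"` so that placeholders become `NaN` and are skipped by `median()`. The first three values echo the preview at the top of the notebook; the `'-'` entry is inserted for illustration:

```python
import pandas as pd

# Toy stand-in for police_killings["p_income"], stored as strings.
raw = pd.Series(["28375", "14678", "-", "25286"])

# Unparseable entries such as "-" are turned into NaN instead of raising.
income_clean = pd.to_numeric(raw, errors="coerce")

# median() ignores NaN by default.
median_income = income_clean.median()
print(median_income)
```

Coercion would also hide any other unexpected strings, so it may be worth checking `income_clean.isna().sum()` against the number of `'-'` entries.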

Shootings By State¶

In [10]:

state_pop = pd.read_csv("state_population.csv")

In [11]:

counts = police_killings["state_fp"].value_counts()
# counts is a pandas Series: the index holds the FIPS code of each state,
# and the values are the number of police killings in that state.

In [12]:

states = pd.DataFrame({"STATE": counts.index, "shootings": counts})
states = state_pop.merge(states, on = "STATE")
# STATE is the common column that both states and state_pop share.

In [13]:

states["pop_millions"] = states["POPESTIMATE2015"]/1000000

In [14]:

states["rate"] = states["shootings"]/states["pop_millions"]
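
Note that `merge` defaults to an inner join, so a state with no recorded killings is dropped from `states` rather than listed with a rate of zero. A sketch of one way to keep such states, using a hypothetical `toy_pop` table with made-up populations in place of `state_population.csv`:

```python
import pandas as pd

# Hypothetical stand-in for state_pop (populations are made up).
toy_pop = pd.DataFrame({
    "STATE": [1, 2, 3],
    "POPESTIMATE2015": [4_000_000, 1_000_000, 2_000_000],
})

# Hypothetical stand-in for police_killings["state_fp"]; state 3 has no rows.
state_fp = pd.Series([1, 1, 2])

counts = state_fp.value_counts()
shootings = pd.DataFrame({"STATE": counts.index, "shootings": counts.values})

# how="left" keeps every state in toy_pop; missing counts become 0.
toy_states = toy_pop.merge(shootings, on="STATE", how="left")
toy_states["shootings"] = toy_states["shootings"].fillna(0)

# Killings per million residents.
toy_states["pop_millions"] = toy_states["POPESTIMATE2015"] / 1_000_000
toy_states["rate"] = toy_states["shootings"] / toy_states["pop_millions"]
print(toy_states)
```

Whether zero-count states belong in the comparison is a judgment call; with only six months of data, a zero may say more about sample size than about the state.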

In [15]:

states.sort_values("rate")

Out[15]:

	SUMLEV	REGION	DIVISION	STATE	NAME	POPESTIMATE2015	POPEST18PLUS2015	PCNT_POPEST18PLUS	shootings	pop_millions	rate
6	40	1	1	9	Connecticut	3590886	2826827	78.7	1	3.590886	0.278483
37	40	1	2	42	Pennsylvania	12802503	10112229	79.0	7	12.802503	0.546768
15	40	2	4	19	Iowa	3123899	2395103	76.7	2	3.123899	0.640226
32	40	1	2	36	New York	19795791	15584974	78.7	13	19.795791	0.656705
21	40	1	1	25	Massachusetts	6794422	5407335	79.6	5	6.794422	0.735898
29	40	1	1	33	New Hampshire	1330608	1066610	80.2	1	1.330608	0.751536
19	40	1	1	23	Maine	1329328	1072948	80.7	1	1.329328	0.752260
13	40	2	3	17	Illinois	12859995	9901322	77.0	11	12.859995	0.855366
34	40	2	3	39	Ohio	11613423	8984946	77.4	10	11.613423	0.861073
45	40	2	3	55	Wisconsin	5771337	4476711	77.6	5	5.771337	0.866350
22	40	2	3	26	Michigan	9922576	7715272	77.8	9	9.922576	0.907023
39	40	3	6	47	Tennessee	6600299	5102688	77.3	6	6.600299	0.909050
33	40	3	5	37	North Carolina	10042802	7752234	77.2	10	10.042802	0.995738
28	40	4	8	32	Nevada	2890845	2221681	76.9	3	2.890845	1.037759
42	40	3	5	51	Virginia	8382993	6512571	77.7	9	8.382993	1.073602
44	40	3	5	54	West Virginia	1844128	1464532	79.4	2	1.844128	1.084523
23	40	2	4	27	Minnesota	5489594	4205207	76.6	6	5.489594	1.092977
14	40	2	3	18	Indiana	6619680	5040224	76.1	8	6.619680	1.208518
30	40	1	2	34	New Jersey	8958013	6959192	77.7	11	8.958013	1.227951
3	40	3	7	5	Arkansas	2978204	2272904	76.3	4	2.978204	1.343091
9	40	3	5	12	Florida	20271272	16166143	79.7	29	20.271272	1.430596
8	40	3	5	11	District of Columbia	672228	554121	82.4	1	0.672228	1.487591
43	40	4	9	53	Washington	7170351	5558509	77.5	11	7.170351	1.534095
10	40	3	5	13	Georgia	10214860	7710688	75.5	16	10.214860	1.566346
17	40	3	6	21	Kentucky	4425092	3413425	77.1	7	4.425092	1.581888
25	40	2	4	29	Missouri	6083672	4692196	77.1	10	6.083672	1.643744
0	40	3	6	1	Alabama	4858979	3755483	77.3	8	4.858979	1.646436
20	40	3	5	24	Maryland	6006401	4658175	77.6	10	6.006401	1.664891
41	40	4	8	49	Utah	2995919	2083423	69.5	5	2.995919	1.668937
46	40	4	8	56	Wyoming	586107	447212	76.3	1	0.586107	1.706173
40	40	3	7	48	Texas	27469114	20257343	73.7	47	27.469114	1.711013
38	40	3	5	45	South Carolina	4896146	3804558	77.7	9	4.896146	1.838180
4	40	4	9	6	California	39144818	30023902	76.7	74	39.144818	1.890416
26	40	4	8	30	Montana	1032949	806529	78.1	2	1.032949	1.936204
36	40	4	9	41	Oregon	4028977	3166121	78.6	8	4.028977	1.985616
24	40	3	6	28	Mississippi	2992333	2265485	75.7	6	2.992333	2.005124
16	40	2	4	20	Kansas	2911641	2192084	75.3	6	2.911641	2.060694
7	40	3	5	10	Delaware	945934	741548	78.4	2	0.945934	2.114312
5	40	4	8	8	Colorado	5456574	4199509	77.0	12	5.456574	2.199182
18	40	3	7	22	Louisiana	4670724	3555911	76.1	11	4.670724	2.355095
31	40	4	8	35	New Mexico	2085109	1588201	76.2	5	2.085109	2.397956
12	40	4	8	16	Idaho	1654930	1222093	73.8	4	1.654930	2.417021
1	40	4	9	2	Alaska	738432	552166	74.8	2	0.738432	2.708442
11	40	4	9	15	Hawaii	1431603	1120770	78.3	4	1.431603	2.794071
27	40	2	4	31	Nebraska	1896190	1425853	75.2	6	1.896190	3.164240
2	40	4	8	4	Arizona	6828065	5205215	76.2	25	6.828065	3.661359
35	40	3	7	40	Oklahoma	3911338	2950017	75.4	22	3.911338	5.624674

States in the West and South seem to have the highest police killing rates (nine of the ten highest, going by the Census `REGION` codes in the table above), whereas the ten lowest rates are all in the Northeast and Midwest.
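
One way to check a regional pattern like this is to aggregate by the `REGION` code from the population table. The sketch below assumes the standard Census Bureau coding (1 = Northeast, 2 = Midwest, 3 = South, 4 = West) and uses four rows copied from the output above; the same `groupby` should apply to the full `states` table.

```python
import pandas as pd

# Four rows copied from the sorted table above.
sample = pd.DataFrame({
    "NAME": ["Connecticut", "New York", "Oklahoma", "Arizona"],
    "REGION": [1, 1, 3, 4],
    "shootings": [1, 13, 22, 25],
    "pop_millions": [3.590886, 19.795791, 3.911338, 6.828065],
})

# Assumed Census Bureau region coding.
region_names = {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"}

# Sum killings and population per region, then compute a pooled rate.
totals = sample.groupby("REGION")[["shootings", "pop_millions"]].sum()
totals["rate"] = totals["shootings"] / totals["pop_millions"]
totals.index = totals.index.map(region_names)
print(totals)
```

Pooling killings and population before dividing weights each state by its size, which is different from averaging the per-state rates.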

State By State Differences¶

Dive deeper into the data to explain differences in police killing rates.¶

In [36]:

# Keep only rows where every share column is populated ("-" marks missing values).
# .copy() makes pk an independent DataFrame, so the assignments below
# do not trigger SettingWithCopyWarning.
pk = police_killings[(police_killings["share_white"] != "-")
                     & (police_killings["share_black"] != "-")
                     & (police_killings["share_hispanic"] != "-")].copy()

pk["share_white"] = pk["share_white"].astype('float')
pk["share_black"] = pk["share_black"].astype('float')
pk["share_hispanic"] = pk["share_hispanic"].astype('float')

In [56]:

lowest_states = ["CT", "PA", "IA", "NY", "MA", "NH", "ME", "IL", "OH", "WI"]
highest_states = ["OK", "AZ", "NE", "HI", "AK", "ID", "NM", "LA", "CO", "DE"]

ls = pk[pk["state"].isin(lowest_states)]
hs = pk[pk["state"].isin(highest_states)]

Means For The States With The Lowest Shooting Rates¶

In [65]:

ls[["pop", "county_income",
    "share_white", "share_black", "share_hispanic"]].mean()

Out[65]:

pop                4201.660714
county_income     54830.839286
share_white          60.616071
share_black          21.257143
share_hispanic       12.948214
dtype: float64

Means For The States With The Highest Shooting Rates¶

In [66]:

hs[["pop", "county_income",
    "share_white", "share_black", "share_hispanic"]].mean()

Out[66]:

pop                4315.750000
county_income     48706.967391
share_white          55.652174
share_black          11.532609
share_hispanic       20.693478
dtype: float64

It looks like the states with low rates of shootings tend to have a higher share of Black residents (a mean of about 21.3% versus 11.5%) and a lower share of Hispanic residents (about 12.9% versus 20.7%) in the census areas where the shootings occur. The counties where those shootings occur also have a higher mean income (about $54,800 versus $48,700).

States with high rates of shootings tend to have high Hispanic population shares in the census areas where shootings occur.
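
To make this comparison easier to read, the two sets of means can be placed side by side. A minimal sketch that hard-codes three of the means reported above rather than recomputing them from `ls` and `hs`:

```python
import pandas as pd

# Means copied from Out[65] (lowest-rate states) and Out[66] (highest-rate states).
lowest = pd.Series({
    "county_income": 54830.839286,
    "share_black": 21.257143,
    "share_hispanic": 12.948214,
})
highest = pd.Series({
    "county_income": 48706.967391,
    "share_black": 11.532609,
    "share_hispanic": 20.693478,
})

# One row per variable, one column per group, plus the gap between them.
comparison = pd.DataFrame({"lowest": lowest, "highest": highest})
comparison["difference"] = comparison["highest"] - comparison["lowest"]
print(comparison)
```

In the notebook itself, the two Series could instead come straight from `ls[cols].mean()` and `hs[cols].mean()`, where `cols` is the list of column names used above.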

In [67]:

hs[["pop", "county_income",
    "share_white", "share_black", "share_hispanic"]].describe()

Out[67]:

	pop	county_income	share_white	share_black	share_hispanic
count	92.000000	92.000000	92.000000	92.000000	92.000000
mean	4315.750000	48706.967391	55.652174	11.532609	20.693478
std	2063.723609	9839.206872	24.406158	19.591303	20.415690
min	403.000000	25498.000000	2.100000	0.000000	0.000000
25%	2886.000000	42987.000000	39.175000	0.675000	4.350000
50%	4257.500000	48801.000000	58.200000	2.700000	10.850000
75%	5377.000000	53596.000000	74.200000	11.550000	31.725000
max	13561.000000	77454.000000	95.900000	93.100000	81.500000